Abstract

UNIX  is  often  said  to  be  a  poor  real-time  system,  but  rarely  are  its  weaknesses  identified. 
In  this  paper,  we  describe  some  of  UNIX's  real-time  problems  in  the  context  of  millisecond  level 
control  applications.  We  find  that  such  problems  are  due  to  both  the  system  interface  and  the 
implementation. 

We also describe how some of these problems were solved in the SAGE operating system, a
small system specifically designed for such control applications. Although SAGE is not a UNIX
system, it has many similarities, and hence many of the solutions can be applied to UNIX.

1      Introduction 

In a time-sharing environment, where response time on the order of seconds can be tolerated, UNIX
performs  ably.  But  in  a  supervisory  control  environment,  where  a  number  of  devices  and  their 
controlling  processes  must  be  serviced  within  milliseconds,  UNIX  does  not  work  well.  Indeed,  neither 
System  V  nor  Berkeley  UNIX  can  reliably  support  a  real-time  application  with  millisecond  time 
constraints,  even  though  the  underlying  hardware  can  provide  the  needed  computational  power. 
Nevertheless,  there  are  several  reasons  why  real-time  capabilities  should  be  added  to  UNIX: 

• UNIX already supports many of the facilities needed for writing sophisticated real-time
applications. Besides a rich development environment, it has a simple system interface, runs on a
wide range of hardware, and supports many different communication disciplines.

•  Other  commercially  available  real-time  systems  also  tend  to  have  shortcomings.  Many  are 
missing  a  significant  number  of  facilities  that  UNIX  provides,  e.g.  [1],  while  others  run  only 
on  vendor  specific  hardware,  e.g.  [2].  In  most  cases,  software  problems  cannot  be  overcome 
because  source  code  is  either  unobtainable  or  prohibitively  expensive. 

•  By  using  the  same  system  for  program  development  and  real-time  control,  both  software 
development  and  hardware  costs  can  be  reduced. 

• Time-sharing applications will also benefit from the additional real-time capabilities, since the
system  will  have  greater  functionality  and  performance. 

In  this  paper,  we  describe  why  existing  UNIX  system  interfaces  and  implementation  techniques 
cannot  reliably  support  real-time  applications  with  millisecond  time  constraints.  We  also  describe 
how some of these problems were resolved in SAGE, a real-time operating system designed specifically
for  supervisory  control  applications.  Although  SAGE  is  not  a  UNIX  system,  it  has  a  similar  internal 
structure  and  emulates  many  of  the  UNIX  system  calls.  Therefore,  SAGE  and  UNIX  are  quite 
similar  at  both  the  kernel  and  programmer  levels,  and  many  of  the  techniques  used  in  SAGE  can  be 
directly  applied  to  UNIX. 


2      UNIX  Deficiencies 

In  this  section  we  describe  some  of  the  deficiencies  that  arise  in  both  the  UNIX  system  interface 
and  implementation.  For  the  most  part  we  consider  only  the  two  major  variants  of  UNIX:  Berkeley 
4.3  BSD  and  System  V  UNIX.  However,  since  most  UNIX  systems  are  ported  from  one  of  these  two 
bases,  the  discussion  should  be  relevant  to  most  other  UNIX  systems  as  well. 

2.1      Interface  Problems 

Interface problems are due to inadequate, incomplete, or missing functionality in the system
facilities available to the application. For real-time applications running under UNIX, interface
problems arise in many areas, including the scheduler, timer, memory management, and IPC facilities.

2.1.1  Process  Scheduling 

In UNIX, process priority is set through either the nice (System V) or setpriority (Berkeley)
system call. Unfortunately, neither call guarantees that the process with the highest assigned priority
will  be  running  at  any  given  instant,  since  the  system's  round-robin  scheduling  algorithm  preempts 
processes  that  have  been  using  extensive  amounts  of  CPU  time. 

This  presents  a  problem  for  many  real-time  applications,  which  are  structured  on  a  fixed  priority 
basis. Typically, the most time-critical processes are (statically) assigned the highest priorities, with
the  understanding  that  at  any  instant  the  scheduler  should  run  the  highest  priority  process  that  is 
ready.  For  such  real-time  applications,  UNIX  must  provide  some  range  of  priorities  that  are  not 
subject  to  round-robin  preemption. 

Another  useful  facility  the  system  can  provide  is  to  allow  one  process  to  change  the  priority  of 
another process. Then a sophisticated application could conveniently perform its own scheduling
(such as running the process with the nearest deadline first). Unfortunately, there is no way to change the
priority  of  another  process  in  System  V  UNIX.  Note,  however,  that  the  Berkeley  UNIX  setpriority 
call  does  provide  this  capability. 
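The application-level scheduling mentioned above can be sketched as follows. This is only an illustration: the task record and the function name nearest_deadline are hypothetical and do not come from any UNIX interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task record kept by an application-level scheduler. */
struct task {
    int  pid;          /* process to be scheduled */
    long deadline_us;  /* absolute deadline, in microseconds */
    int  ready;        /* nonzero if the task can run now */
};

/* Return the index of the ready task with the nearest deadline,
 * or -1 if no task is ready. */
int nearest_deadline(const struct task *t, size_t n)
{
    int best = -1;

    assert(t != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!t[i].ready)
            continue;
        if (best < 0 || t[i].deadline_us < t[best].deadline_us)
            best = (int)i;
    }
    return best;
}
```

On Berkeley UNIX, the scheduling process could then raise the chosen task with setpriority(PRIO_PROCESS, pid, prio). As noted in this section, that priority is still subject to round-robin preemption.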

2.1.2  Timer  Facilities 

As  a  rule  of  thumb,  timer  services  should  provide  a  resolution  of  at  least  two  orders  of  magnitude 
smaller  than  the  period  at  which  events  occur,  to  avoid  quantization  error.  Thus,  for  example,  if 
events  occur  on  the  order  of  seconds  (such  as  in  a  general  time-sharing  system),  system  time  services 
should  be  accurate  to  roughly  1/100  of  a  second.  Likewise,  when  dealing  with  real-time  events  on 
the  order  of  milliseconds,  the  timers  should  be  accurate  to  tens  of  microseconds. 

Unfortunately, UNIX timers are much too coarse for real-time applications. System V UNIX,
for instance, only provides for second granularity on alarms and time of day, and (using streams)
millisecond resolution for polling and sleep functions.

These  problems  are  corrected  by  the  Berkeley  UNIX  interface,  which  specifies  all  time  values  with 
microsecond  resolution.  In  reality,  however,  implementation  problems  (to  be  described  later)  reduce 
the effective resolution of the Berkeley timers to hundredths of a second, which again is inadequate
for  real-time. 

UNIX timer facilities are also lacking some needed functionality. Because the timer values are
interpreted  relative  to  the  current  time,  there  is  no  way  to  atomically  perform  such  operations  as 
"sleep  until  a  given  time."  Thus  all  timer  events  are  subject  to  an  unpredictable  amount  of  clock 
skew,  which  is  unacceptable  if  events  should  be  generated  at  specific  times.  Berkeley  UNIX  addresses 
this problem in a limited way, by providing a second "repeat" argument to the setitimer call.
But it is still impossible to generate a single event without skew, or to generate multiple events with
non-uniform  periods. 
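A small model shows how skew accumulates when an application re-arms a relative timer after each wakeup. It assumes, purely for illustration, that every wakeup is delivered a fixed latency_us late. Both function names are hypothetical.

```c
#include <assert.h>

/* Time of the n-th wakeup when each sleep is armed relative to the
 * previous (late) wakeup. */
long relative_wakeup(long period_us, long latency_us, int n)
{
    long now = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++)
        now += period_us + latency_us;  /* lateness carries into the next interval */
    return now;
}

/* Time of the n-th wakeup when each sleep targets the absolute
 * time n * period_us. */
long absolute_wakeup(long period_us, long latency_us, int n)
{
    assert(n >= 0);
    if (n == 0)
        return 0;
    return (long)n * period_us + latency_us;  /* lateness does not accumulate */
}
```

With a 1000 microsecond period and 50 microseconds of latency, the relative scheme is 5 milliseconds late after 100 events, while the absolute scheme is still only 50 microseconds late.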


2.1.3  Memory  Management 

The memory management facilities provided by UNIX are inadequate for many real-time
applications. The most glaring deficiency, perhaps, occurs in Berkeley UNIX, where there is no way to lock
a process into memory. Therefore, a page fault taken at an inopportune moment could easily cause
a process to miss a real-time deadline. System V UNIX addresses this problem with a system call
that can lock an entire program text or data segment. This is adequate but wasteful, since only
certain pages generally need to be locked.

For  a  finer  grain  of  memory  locking  control  in  a  demand  paged  real-time  system,  two  system 
calls  seem  appropriate.  The  first  returns  the  status  for  a  range  of  pages  (locked,  in-core,  etc.);  the 
second  forces  the  system  to  bring  (and  perhaps  lock)  these  pages  into  core.  Similar  calls  have  been 
proposed  for  other  UNIX  systems  [5,6]. 

Shared  memory  is  another  real-time  facility  that  is  missing  in  Berkeley  UNIX.  Although  the 
relative  merits  of  shared  memory  (as  opposed  to  message  passing)  have  been  debated  for  many  years, 
shared memory can often provide significantly faster (indeed, optimal) inter-process communication.

Of  course,  shared  memory  is  generally  useful  for  non  real-time  applications  as  well,  e.g.  to 
provide  shared  libraries.  However,  the  main  reason  for  providing  shared  memory  is  performance.  In 
this  way,  the  system  can  accommodate  real-time  applications  that  push  the  limits  of  the  underlying 
hardware.  Fortunately,  many  UNIX  systems  (including  System  V  UNIX)  have  implemented  some 
form  of  shared  memory. 

For  similar  reasons,  memory  mapped  I/O  is  also  useful  for  real-time  applications,  since  it  allows  a 
process  to  efficiently  interface  to  other  devices  and  computers  on  a  shared  bus.  Memory  mapped  I/O 
is  also  attractive  since  it  allows  devices  to  be  programmed  without  resorting  to  kernel  modifications 
or  special  purpose  device  drivers.  Neither  System  V  nor  Berkeley  UNIX  provides  such  a  memory 
mapping capability, although the Berkeley UNIX interface does propose an mmap system call which
can handle memory mapped I/O. Indeed, because of its utility, mmap has been partially implemented
in  several  vendor  supplied  systems. 

Even  with  memory  mapped  I/O,  no  existing  UNIX  system  provides  an  application  level  facility 
to  map  bus  addresses  to  process  memory  (needed  for  DMA  applications)  or  allows  the  user  to 
supply  a  per-application  interrupt  handler.  Therefore,  the  value  of  memory  mapped  I/O  is  greatly 
diminished,  and  device  control  typically  remains  relegated  to  kernel  device  drivers.  In  turn,  this 
causes  the  application  to  incur  greater  overhead  and  reduces  device  programmability  by  fixing  the 
system  interface. 

2.1.4  User-Level  Synchronization 

Existing UNIX systems provide process synchronization through either message passing or (System V)
semaphores. Both mechanisms have fairly high overhead, however. Message passing implies
at  least  two  system  calls  (for  reading  and  writing  the  message)  and  a  context  switch  to  the  monitor 
process  for  every  synchronization  operation.  System  V  semaphores  have  better  performance,  but 
still  require  at  least  one  system  call  for  every  synchronization  operation. 

Inefficient synchronization primitives greatly diminish the performance benefits of shared memory,
since short critical sections are frequently performed when accessing shared variables. Therefore,
more  efficient  synchronization  methods  must  be  provided  by  a  real-time  UNIX. 
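As one illustration of synchronization that needs no system call, the sketch below implements a test-and-set lock on a word that would live in shared memory. It uses C11 atomics, which postdate the systems discussed here, and the user_lock names are hypothetical. It is not the mechanism provided by any of the systems described in this paper.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* A lock word intended to live in memory shared by the cooperating processes. */
typedef struct {
    atomic_flag held;
} user_lock;

void user_lock_init(user_lock *l)
{
    assert(l != NULL);
    atomic_flag_clear(&l->held);
}

/* Try once, without entering the kernel; true if the lock was acquired. */
bool user_lock_try(user_lock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Busy-wait until the lock is acquired. */
void user_lock_acquire(user_lock *l)
{
    while (!user_lock_try(l))
        ;  /* spin */
}

void user_lock_release(user_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Spinning is only reasonable when the holder is guaranteed to make progress. On a uniprocessor with fixed priorities, a high priority process spinning on a lock held by a lower priority process would never release the CPU, so a real design must pair such a lock with some scheduling support.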

Synchronization,  like  many  performance  problems,  can  be  viewed  as  either  an  interface  problem 
or  an  implementation  problem:  it  is  an  interface  problem  if  we  plan  to  add  new  mechanisms,  but  it  is 
an implementation problem if we plan to speed up existing mechanisms. Indeed, for many problems,
both  the  interface  and  the  implementation  will  have  to  be  modified  to  achieve  the  desired  level  of 
performance. 

2.1.5  Serial  Lines 

Since RS-232 is a ubiquitous device interface, a real-time system should be able to support a number
of high speed tty lines efficiently. One major problem with the UNIX tty driver interface is the high
overhead incurred running in "raw" mode. Because the interface returns as soon as characters are
present, the process rapidly context switches between kernel and user mode.

To  avoid  context  switches,  the  usual  workaround  is  to  delay  for  an  appropriate  period  before 
reading  from  the  line.  In  general,  however,  this  solution  either  encounters  needless  delays  (by 
waiting too long) or generates extra context switches (by waiting too little). Another approach is to
construct  a  special  purpose  kernel  line  discipline  such  as  used  for  Berknet  or  SLIP  in  4.3  BSD,  but 
once  again  this  does  not  address  the  general  problem. 

Another  problem  with  the  serial  line  interface  is  that  it  does  not  support  the  full  range  of  UART 
capabilities: parity, stop bits, variable data bits, etc. For example, UNIX does not support eight bit
data  with  parity,  a  configuration  which  is  becoming  more  common. 

2.1.6  Interprocess  Communication

Current  UNIX  systems  provide  a  variety  of  IPC  facilities,  including  signals,  pipes,  sockets  (Berkeley), 
named pipes, and messages (System V). These different styles of communication are particularly
useful  for  providing  client-server  based  applications. 

Although the client-server model is a useful one for structuring distributed and modular
applications, a potential problem arises for real-time applications. Because the client's service request
is executed in the context of the server process, the client's priority effectively becomes that of the
server. If the server and client have different priorities, the request will assume either a greater or
lesser importance when processed by the server. Usually, however, we want the server to run at the
same priority as the client.

This example points out that the system should be able to propagate priorities across
communication streams; neither Berkeley nor System V UNIX addresses this issue. While it is possible for
much of this work to be done by the application (by passing priorities in messages), some amount
of system support still seems required. For instance, the system needs to be able to recognize when
a high priority request for the server is pending, in order that the process be scheduled promptly.
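The application-level half of this scheme, passing priorities in messages, might look like the sketch below. The request layout and the server_priority function are hypothetical, and larger numbers are assumed to mean higher priority.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request message that carries the sender's priority. */
struct request {
    int client_priority;  /* filled in by the client before sending */
    int opcode;           /* what the client wants done */
};

/* Priority the server should run at: that of its most urgent pending
 * request, or idle_priority when nothing is pending. */
int server_priority(const struct request *pending, size_t n, int idle_priority)
{
    int prio;

    assert(pending != NULL || n == 0);
    if (n == 0)
        return idle_priority;
    prio = pending[0].client_priority;
    for (size_t i = 1; i < n; i++)
        if (pending[i].client_priority > prio)
            prio = pending[i].client_priority;
    return prio;
}
```

The sketch only computes the priority. Without system support, the server cannot learn that a more urgent request has arrived until it is next scheduled, which is the gap described above.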

In the distributed environment the problem is more pervasive, since it extends beyond the operating
system to the underlying communication subnet; communications resources should be allocated
on a priority basis, and the associated protocols must recognize different classes of service.
Unfortunately, such priority service is rarely provided in existing networks.

2.2      Implementation

The main implementation problems in UNIX are due to:

• Latency Effects. In many cases, there can be a significant amount of delay between the time a
process is supposed to run and when it actually does run, potentially causing a real-time deadline
to be missed.

•  Intolerable  Overhead.  Some  facilities  are  so  costly  in  terms  of  time  that  they  cannot  be  used. 
This  renders  the  facility  useless. 

• Partial Implementations. Although the system service should theoretically provide the necessary
functionality, it has not been completely implemented.

We  now  describe  some  of  the  more  severe  implementation  problems. 

2.2.1      Process  Latency 

The  two  major  causes  of  process  latency  in  UNIX  are  the  non-preemptibility  of  the  kernel  and  the 
lengthy  amount  of  time  spent  in  the  interrupt  handlers.  Because  the  kernel  is  non-preemptible,  a 
newly  readied  higher  priority  process  must  wait  until  an  active  kernel  process  completes  a  system 
call or voluntarily gives up the processor. Since system calls such as read and write often perform
block  copies  of  several  kilobytes,  these  delays  can  amount  to  milliseconds. 

Interrupt handlers and code that disables hardware interrupts also contribute to latency in two
ways. First, rescheduling is delayed, since rescheduling events are effectively disabled while processor
priority is raised. Secondly, an interrupt handler steals cycles from the currently executing process.
Therefore a process may not receive enough time to be able to meet a real-time deadline. Lengthy
interrupt handlers are especially common in both the Berkeley networking and System V streams
code, where checksums and block copies are done at software interrupt level.

Clearly, to avoid latency effects, kernel processes need to be preemptible, and interrupt routines
need  to  queue  work  for  later  processing  by  kernel-resident  preemptible  processes. 
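The idea of queueing work in an interrupt routine for later preemptible processing can be sketched with a small ring buffer. All names here (workq, workq_post, workq_drain, count_packet) are hypothetical. A real kernel would also have to mask interrupts, or use a lock-free discipline, around the head and tail updates.

```c
#include <assert.h>
#include <stddef.h>

#define WORKQ_SIZE 8  /* ring holds at most WORKQ_SIZE - 1 items */

typedef void (*work_fn)(void *arg);

struct work {
    work_fn fn;
    void   *arg;
};

struct workq {
    struct work items[WORKQ_SIZE];
    unsigned head;  /* next item to run */
    unsigned tail;  /* next free slot */
};

/* Called from the interrupt routine: record the work and return quickly.
 * Returns 0 on success, -1 if the queue is full. */
int workq_post(struct workq *q, work_fn fn, void *arg)
{
    unsigned next = (q->tail + 1) % WORKQ_SIZE;

    assert(fn != NULL);
    if (next == q->head)
        return -1;
    q->items[q->tail].fn = fn;
    q->items[q->tail].arg = arg;
    q->tail = next;
    return 0;
}

/* Called from a preemptible kernel process: run everything queued so far.
 * Returns the number of work items executed. */
int workq_drain(struct workq *q)
{
    int ran = 0;

    while (q->head != q->tail) {
        struct work w = q->items[q->head];
        q->head = (q->head + 1) % WORKQ_SIZE;
        w.fn(w.arg);
        ran++;
    }
    return ran;
}

/* Example work item: stands in for a lengthy operation such as a checksum. */
void count_packet(void *arg)
{
    ++*(int *)arg;
}
```

The interrupt routine pays only for the enqueue. The lengthy part runs later at process priority, where a more urgent process can preempt it.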

2.2.2  Signal  Latency 

Signal  delivery  experiences  latency  effects  similar  to  process  scheduling,  since  once  a  process  is 
running  inside  the  kernel,  it  only  checks  for  signals  before  it  returns  to  user  mode  or  (for  interruptible 
kernel  sleeps)  when  it  gives  up  the  processor. 

Since signals simulate software interrupts, they are useful for communicating exceptional
conditions. However, if such a facility is going to be used in real-time applications, the implementation
must  allow  processes  to  respond  to  such  signals  immediately.  In  particular,  when  a  signal  is  delivered 
to  a  process  running  inside  the  kernel,  the  kernel  stack  needs  to  be  unwound.  Therefore  the  kernel 
must  have  some  exception  handling  capability. 

2.2.3  Timer  Problems 

As mentioned before, Berkeley UNIX allows interval timers to be specified in microseconds.
Internally, however, such values are rounded up to the nearest tick of the line clock, so the effective
resolution of the timer is really the line clock period. In Berkeley UNIX the line clock period
is 1/100 of a second, much too coarse for the real-time domain.
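The effect of this rounding is easy to work through. The helper below is a hypothetical illustration of the arithmetic, not kernel code. With a tick of 1/100 of a second (10000 microseconds), a request for 1 microsecond is served as 10000 microseconds.

```c
#include <assert.h>

/* Round a requested interval up to a whole number of clock ticks. */
long round_to_tick(long request_us, long tick_us)
{
    assert(request_us >= 0 && tick_us > 0);
    return ((request_us + tick_us - 1) / tick_us) * tick_us;
}
```

Similarly, a request for 10001 microseconds becomes 20000 microseconds, nearly double what the application asked for.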

Profiling  time  values  generally  suffer  the  same  resolution  problem,  since  these  counters  are  only 
updated from the line clock interrupt. Indeed, in most UNIX implementations, time of day is also only
accurate  to  the  nearest  clock  tick  (although  the  4.3  BSD  VAX  distribution  does  return  reasonably 
accurate  values). 

2.2.4  Copy  Overhead 

In  any  real-time  system,  reducing  overhead  is  an  important  goal.  In  UNIX,  one  major  cause  of  system 
overhead  is  the  time  spent  copying  data  back  and  forth  between  the  kernel  and  user  address  spaces. 
In  particular,  it  takes  roughly  a  millisecond  to  copy  a  page  on  a  standard  32-bit  microprocessor.  In 
many  cases,  however,  these  copies  can  be  avoided  by  mapping  the  data  pages  into  and  out  of  the 
kernel.  Systems  such  as  Mach  have  shown  that  such  an  approach  is  feasible  and  implementable  in 
UNIX  [5]. 

Unfortunately,  the  Mach  scheme  may  cause  problems  for  real-time  applications.  When  the  same 
page  is  mapped  into  two  different  address  spaces,  and  one  process  attempts  to  modify  the  page,  a 
page  fault  is  generated,  and  a  copy  of  the  page  is  made.  In  many  cases,  this  would  cause  the  process 
to  experience  unacceptable  latencies.  Ideally,  the  system  should  avoid  copying  data  where  possible, 
while at the same time ensuring that processes cannot generate unpredictable page faults.

2.2.5  Queueing  Delays 

Queues  are  useful  in  real-time  systems,  since  they  increase  the  potential  concurrency  of  the  system. 
However,  because  most  of  the  internally  maintained  system  queues  obey  a  FIFO  queueing  discipline, 
a  queued  request  can  experience  unpredictable  delays  before  it  is  serviced.  Therefore,  there  is  no 
guarantee  that  such  queued  requests  will  be  performed  in  a  timely  fashion.  This  is  especially  true 
when  performing  I/O  on  heavily  shared  devices  such  as  network  interfaces  and  disk  drives. 

To  eliminate  unwanted  queueing  delays,  the  UNIX  kernel  should  be  modified  to  support  internal 
queueing  disciplines  based  on  process  priority.  In  other  words,  a  priority  should  be  associated  with 
each  message  in  the  queue,  equal  to  the  priority  of  the  process  that  generated  the  message.  If  the 
system  then  services  the  queue  in  priority  order,  the  highest  priority  requests  will  always  be  handled 
in  a  timely  fashion. 
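A priority-ordered queue of this kind can be sketched as a sorted linked list. The qmsg structure and function names are hypothetical, and larger numbers are assumed to mean higher priority. Requests of equal priority keep their arrival (FIFO) order.

```c
#include <assert.h>
#include <stddef.h>

struct qmsg {
    int priority;       /* priority of the process that queued the message */
    int id;             /* stands in for the message contents */
    struct qmsg *next;
};

/* Insert m so the list stays in descending priority order,
 * behind any messages of the same priority. */
void prio_enqueue(struct qmsg **head, struct qmsg *m)
{
    assert(head != NULL && m != NULL);
    while (*head != NULL && (*head)->priority >= m->priority)
        head = &(*head)->next;
    m->next = *head;
    *head = m;
}

/* Remove and return the highest priority message, or NULL if the queue is empty. */
struct qmsg *prio_dequeue(struct qmsg **head)
{
    struct qmsg *m = *head;

    if (m != NULL)
        *head = m->next;
    return m;
}
```

Insertion is linear in the queue length, which is acceptable for short queues. A heap or per-priority FIFO lists would bound the cost for longer ones.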


3      SAGE  Overview 

Many of the UNIX problems we have just described have been directly addressed in SAGE, a real-time
operating system that is being developed for robotics control applications.

The  SAGE  system  architecture  is  quite  similar  to  that  of  the  BLIT  or  DMD.  Programs  are 
developed  and  cross-compiled  on  a  UNIX  workstation,  and  then  dynamically  downloaded  to  the 
SAGE host, which consists of a resident kernel running on a 68000 processor board. SAGE programs
invoke  operating  system  services  by  trapping  to  the  executive. 

The SAGE kernel provides many of the facilities needed for real-time supervisory control: multitasking,
a preemptive scheduler, precise timing facilities, and a number of supported devices. More
novel  features  of  the  system  include  extensive  use  of  memory  management  facilities  and  support  of 
real-time  network  communications,  things  that  are  rarely  found  in  other  real-time  systems. 

Many  aspects  of  the  SAGE  design  were  based  on  previous  experiences  with  NRTX,  a  real-time 
operating system developed at AT&T Bell Laboratories [7]. In fact, the original SAGE kernel was
bootstrapped  from  the  NRTX  development  tools,  and  the  two  systems  have  a  similar  architecture. 
However,  the  current  SAGE  kernel  bears  little  resemblance  to  NRTX. 

3.1      Design  Goals 

The  main  design  goals  of  SAGE  were  to: 

• Support real-time supervisory control applications. Such applications require millisecond response
time, and must interface to a diverse number of devices and systems. Therefore the
kernel must be responsive and provide extensive I/O capabilities.

•  Provide  a  friendly  development  environment.  SAGE  was  intended  to  be  used  for  research  in 
robotics  control.  Therefore,  good  system  support  for  program  development  and  debugging  was 
required. 

•  Maintain  a  large  degree  of  UNIX  compatibility.  UNIX  was  already  a  familiar  and  friendly 
environment, and a degree of compatibility would allow us to avoid learning an entirely new
system.  Furthermore,  modules  could  be  partially  debugged  in  the  UNIX  environment. 

Perhaps  the  most  important  design  goal  was  an  implicit  requirement:  that  a  working  system  be 
ready quickly for use in other research projects. This led to incorporating only those facilities
essential  to  our  real-time  applications,  and  making  every  effort  to  take  advantage  of  existing  code 
where  possible,  especially  since  the  development  of  the  system  was  a  one-man  effort. 

The  desire  for  a  quick  implementation  is  reflected  in  two  ways.  First,  by  relegating  program 
development  functions  to  the  UNIX  host,  the  SAGE  kernel  can  be  made  considerably  simpler. 
Secondly,  SAGE  is  internally  structured  in  a  similar  fashion  to  UNIX.  This  has  allowed  us  to  rapidly 
develop  a  working  system,  since  a  large  amount  of  UNIX  code  could  be  reused  in  SAGE.  In  fact, 
the  device  structure  entries  of  both  systems  are  nearly  identical,  so  standard  Multibus  and  VMEbus 
UNIX  device  drivers  only  need  minor  changes  to  run  under  SAGE. 

Although  SAGE  is  similar  in  many  ways  to  UNIX,  SAGE  is  most  definitely  not  a  UNIX  system. 
At the system interface level, for example, process creation is handled completely differently (there is
no equivalent of fork), and all processes share the same virtual address space (although
one process, in general, cannot overwrite another process's address space). Also, a few major
subsystems, such as filesystem support, have been removed. Instead, SAGE provides file operations
through remote procedure calls to a UNIX system serving as a fileserver.

Furthermore,  the  SAGE  kernel  does  not  support  the  standard  UNIX  user  interface.  Neither  the 
shell,  nor  most  of  the  standard  UNIX  utilities,  run  under  the  SAGE  kernel.  Rather,  the  SAGE 
kernel  is  intended  solely  to  provide  a  good  execution  environment  for  real-time  programs,  with  most 
program  development  and  user  interface  issues  handled  through  the  UNIX  machine. 


4      SAGE  Facilities 

We  now  describe  a  few  of  the  real-time  facilities  provided  by  SAGE  that  are  not  in  standard  UNIX 
systems, including the real-time scheduler, extended timer facilities, process synchronization primitives,
and memory management facilities.

4.1  Scheduler 

The  SAGE  scheduler  always  runs  the  highest  priority  ready  process,  and  it  will  not  adjust  process 
priorities  dynamically.  In  particular,  the  scheduler  does  not  provide  round-robin  preemption,  and  so 
is  much  simpler  than  the  UNIX  scheduler.  In  spite  of  this  simplicity,  the  SAGE  scheduler  is  better 
suited  for  real-time  applications  than  the  UNIX  scheduler  because  processes  cannot  be  unexpectedly 
preempted.  Therefore,  the  most  time  critical  operations  are  guaranteed  to  be  run  first. 

Process  priorities  are  specified  by  64  bit  integers.  By  default,  SAGE  processes  execute  at  the 
processor's  base  priority,  so  all  interrupt  handlers  effectively  have  a  higher  priority  than  any  process. 
However, a SAGE process can elevate its processor priority as needed to mask out interrupts.

SAGE  also  allows  one  process  to  modify  the  priority  of  another  (known)  process.  This  allows  an 
application  to  efficaciously  schedule  a  group  of  related  processes. 

4.2  Process  Latency 

The  single  most  important  goal  in  the  implementation  of  SAGE  was  to  minimize  process  scheduling 
latency.  Therefore,  it  was  important  to  maximize  the  amount  of  preemptible  kernel  code.  It  is 
mostly  for  this  reason  that  the  internal  structure  of  SAGE  differs  from  that  of  a  standard  UNIX 
system,  where  the  kernel  is  non-preemptible. 

Two  techniques  were  used  to  make  the  kernel  preemptible.  First,  all  critical  sections  of  the  kernel 
were identified and suitably protected against rescheduling interrupts. This was tedious but
straightforward, and has been done in many multiprocessor UNIX systems.

Secondly,  every  effort  was  made  to  minimize  the  amount  of  time  spent  in  the  kernel  interrupt 
handlers. Wherever possible, the interrupt handlers were written to queue work for later execution
rather  than  directly  executing  the  code.  In  some  cases  callouts  (described  below)  were  used  to 
execute  subroutines  at  software  interrupt  level,  while  in  other  cases  kernel  resident  processes  were 
used  to  perform  preemptible  lengthy  operations. 

Queueing  work  in  the  interrupt  handler  also  reduced  the  need  to  protect  many  critical  sections 
against  hardware  interrupts,  since  the  shared  data  was  now  being  accessed  by  code  running  at  a  lower 
priority.  Therefore  the  overall  system  latency  was  once  again  reduced,  since  the  critical  sections  only 
needed to mask out lower priority interrupts.

4.2.1  Callouts 

To  support  the  SAGE  style  of  callouts,  the  UNIX-style  callout  code  had  to  be  extended  to  accept 
multiple  arguments,  allow  untimed  callout  events  to  be  queued,  and  provide  separate  queues  for 
timed  callouts  waiting  to  expire  and  callouts  that  need  to  be  executed.  The  hardware  clock  routine 
then  consists  mostly  of  moving  expired  timer  events  from  the  wait  queue  to  the  execute  queue,  and 
generating  a  software  interrupt. 

In general, SAGE uses a callout for executing short subroutines that cause a process state
transition. However, because callouts currently run from a software interrupt handler, they are
non-preemptible. Therefore, for lengthier operations, SAGE uses a kernel process instead.

4.2.2  Kernel  Processes 

Kernel  processes  in  SAGE  are  functionally  equivalent  to  the  kernel-half  of  a  UNIX  process.  In 
particular,  a  kernel  process  occupies  a  slot  in  the  process  table  and  has  its  own  stack.  Because  the 
kernel  is  mapped  into  each  process's  address  space,  a  context  switch  to  a  kernel  process  requires  little 
more  than  saving  and  restoring  its  registers.  Therefore  a  kernel  process  is  extremely  lightweight. 


SAGE  uses  kernel  processes  to  implement  many  of  the  network  input  modules,  including  those 
which support the Internet protocols IP, UDP, and ARP. In this way, the system allows lengthy
network  operations  such  as  checksums  and  packet  reassembly  to  be  preempted. 

In  addition,  kernel  processes  are  used  by  some  device  drivers  to  avoid  extensive  copy  operations 
at  interrupt  time.  For  example,  the  device  driver  for  a  non-DMA  ethernet  controller  uses  two 
kernel processes, one for receiving and one for transmitting. The receiver process copies packets
out of the controller's on-board memory into a system buffer, while the transmitter process copies
buffers from higher level protocols into device memory. SAGE also ensures that the more important
communication  streams  are  serviced  first,  because  the  higher  level  protocols  are  careful  to  queue 
buffers  for  the  driver  in  priority  order. 

4.3  Timer  Facilities

SAGE  extends  the  UNIX  timer  facilities  to  provide  both  relative  and  absolute  timers,  the  main  ones 
being: 

pauseabsolute  pause  a  process  until  a  certain  time 

pauserelative  pause  a  process  for  a  given  amount  of  time 

alarmabsolute  send  an  alarm  signal  at  a  certain  time 

alarmrelative  send  an  alarm  signal  after  a  given  amount  of  time 

gettimeofday  return  the  current  time

settimeofday  set  the  current  time

In  addition,  a  few  profiling  and  instrumentation  timers  are  provided. 

All time arguments are specified to microsecond resolution. Absolute time values are specified
in terms of the number of microseconds elapsed since the system has booted. In addition, the kernel
maintains a global time offset, which when added to the absolute time, gives the current Greenwich
Mean Time. This offset, along with the absolute time, is returned by the gettimeofday call. The
offset can also be changed by the settimeofday call.

To  achieve  high  resolution  on  the  alarm  and  time  of  day  functions,  SAGE  uses  two  hardware 
timers,  with  one  timer  providing  time  of  day  information,  and  the  other  timer  providing  an  event 
timer interrupt. Conceptually then, the time of day function is provided by simply reading the time
of day register. In practice, however, the register is a counter clocked at a fast frequency, and so
can  wrap  around  quickly.  Thus  we  must  also  maintain  an  additional  counter  in  kernel  memory  to 
record  the  overflow. 
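
As a rough illustration, the overflow bookkeeping might look like the following C sketch. It is not the SAGE source: the 16-bit counter width, the variable names hw_counter and overflow_count, and the overflow interrupt handler are all assumptions made for the example, and the hardware register is simulated by an ordinary variable.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the hardware time of day register: a free-running
   16-bit counter (the width is an assumption of this sketch). */
static volatile uint16_t hw_counter;

/* Extension kept in kernel memory: how many times hw_counter has wrapped. */
static volatile uint32_t overflow_count;

/* Hypothetical handler for the interrupt raised when hw_counter wraps. */
void timer_overflow_interrupt(void)
{
    overflow_count++;
}

/* Combine the software and hardware parts into one 64-bit tick count. */
uint64_t read_ticks(void)
{
    uint32_t high;
    uint16_t low;

    /* If an overflow is recorded between the two reads, the pair is
       inconsistent, so read again. */
    do {
        high = overflow_count;
        low = hw_counter;
    } while (high != overflow_count);

    return ((uint64_t)high << 16) | low;
}

/* Convert ticks to microseconds for a counter clocked at ticks_per_usec. */
uint64_t ticks_to_usec(uint64_t ticks, uint32_t ticks_per_usec)
{
    assert(ticks_per_usec > 0);
    return ticks / ticks_per_usec;
}
```

A real kernel would also have to cope with a wrap whose interrupt is still pending because interrupts are masked; the sketch ignores that case.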

Internally,  all  alarm  events  are  stored  on  the  callout  wait  queue,  sorted  in  increasing  absolute 
time.  The  interval  timer  is  then  programmed  to  interrupt  at  a  time  given  by  the  first  queue  entry. 
The  hardware  clock  handler  removes  all  expired  entries  from  the  queue  (as  determined  by  a  simple 
comparison),  and  reprograms  the  interval  timer  to  interrupt  at  the  next  timer  event  given  by  the 
new entry at the head of the wait queue.
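
The queue discipline described above can be sketched in C as follows. This is only a minimal model under stated assumptions: the structure layout, the function names, and the use of a variable (timer_programmed_for) in place of the real interval timer register are all hypothetical, and the value 0 is used here to mean "no event pending".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct callout {
    uint64_t when;            /* absolute expiry time, microseconds since boot */
    void (*func)(void *);     /* action to run when the entry expires */
    void *arg;
    struct callout *next;
};

static struct callout *callout_head;    /* sorted, earliest entry first */
static uint64_t timer_programmed_for;   /* stand-in for the interval timer; 0 = idle */

/* Program the (simulated) interval timer from the first queue entry. */
static void program_interval_timer(void)
{
    timer_programmed_for = callout_head ? callout_head->when : 0;
}

/* Insert an entry, keeping the queue in increasing absolute time. */
void callout_insert(struct callout *c)
{
    struct callout **pp = &callout_head;

    assert(c != NULL && c->func != NULL);
    while (*pp != NULL && (*pp)->when <= c->when)
        pp = &(*pp)->next;
    c->next = *pp;
    *pp = c;
    program_interval_timer();
}

/* Clock handler: remove and run every entry that has expired by 'now',
   then reprogram the timer. Returns the number of entries fired. */
int clock_interrupt(uint64_t now)
{
    int fired = 0;

    while (callout_head != NULL && callout_head->when <= now) {
        struct callout *c = callout_head;
        callout_head = c->next;
        c->func(c->arg);
        fired++;
    }
    program_interval_timer();
    return fired;
}

/* Example action: increment the integer that arg points to. */
void callout_bump(void *arg)
{
    ++*(int *)arg;
}
```

Because the queue is sorted, the handler only ever examines entries at the head, so the cost of an interrupt is proportional to the number of entries that actually expire.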

In  the  current  implementation,  an  AMD  9513  timer  chip  is  used,  and  both  the  time  of  day 
and  interval  timers  are  accurate  to  better  than  5  microseconds.  However,  system  call  and  context 
switching overhead effectively increase this value by an order of magnitude as far as the application is
concerned. Furthermore, because the number of hardware timers is limited, profiling timers are still
performed by a "line-clock" routine called at a configuration-dependent frequency, which is usually
set at 50 Hz. Therefore, the profiling timers still have very coarse resolution (20 milliseconds at 50 Hz).

4.4  User-Level  Synchronization 

SAGE  provides  two  basic  facilities  for  process  synchronization:  a  locking  facility,  whereby  processes 
can  temporarily  raise  their  priority,  and  a  scheduling  facility,  whereby  processes  can  suspend  and 
resume themselves. The facilities are intended to allow application dependent synchronization primitives
to be built with a minimum of system overhead. Variations of these calls were first proposed
in  [10]. 


lock(sharedvar)
{
    word1 = 31;                            /* non-preemptible */
    while (testandset(sharedvar) == TRUE)
        continue;                          /* multiprocessor busy-wait */
}

unlock(sharedvar)
{
    sharedvar = FALSE;                     /* multiprocessor */
    word1 = 0;
    if (word2)
        reschedule();
}

Figure 1: Lock and unlock

4.4.1  Locking 

A test-and-set operation is often used in conjunction with busy-waiting to ensure that small critical
sections are atomically executed. For a fixed priority scheduler, however, such a technique by itself
will not work. This is easily seen in the case where a process, running on a uniprocessor, has been
preempted inside a critical section. If a higher priority process now attempts to enter the critical
section, the busy-wait will always fail, and both processes will deadlock.

What is really needed is a locking facility, whereby a process can ensure it is not preempted
once inside a critical section. In SAGE, this facility is provided by two words (herein denoted as
word1 and word2) that are shared between the process and the kernel scheduler. word1 is set by
the process, and is interpreted by the scheduler as the process's temporary priority. By setting this
priority to a suitably high value, the process can ensure it will not be preempted. word2 is set by
the scheduler, and informs the process that other processes are waiting to run.

Figure 1 shows in detail how a SAGE process can lock and unlock a critical section in a multiprocessor
environment. When the process wants to enter a critical section, it calls lock, which
temporarily raises the process's priority by setting word1. Likewise when leaving the critical section,
the process calls unlock, which restores the process's base priority by clearing word1. Instructions
involving  sharedvar  are  used  to  implement  a  multiprocessor  busy-wait,  which  can  no  longer  deadlock 
because  the  process  holding  the  lock  cannot  be  preempted.  Of  course,  these  particular  instructions 
can  be  eliminated  in  a  uniprocessor  system. 

The scheduler performs a complementary action. If the scheduler is about to preempt the process,
it checks word1, which is the process's temporary priority. If the temporary priority is still too low,
the process is preempted. However, if the temporary priority is now high enough, the process is
allowed to continue, and word2 is set to indicate that preemption has been deferred. unlock will
eventually check if word2 is set, and if it is, request that the processor be rescheduled.
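
The scheduler's side of this protocol might be sketched in C as below. The structure and function names are hypothetical, and two details are assumptions rather than facts from the SAGE implementation: larger numbers are taken to mean more urgent priorities (consistent with the value 31 in Figure 1), and a temporary priority equal to the waiting process's priority is treated as high enough to defer preemption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The two words shared between a process and the scheduler. */
struct shared_words {
    volatile int word1;   /* temporary priority, written by the process */
    volatile int word2;   /* nonzero once a preemption has been deferred */
};

struct proc {
    int base_priority;          /* larger value = more urgent (assumed) */
    struct shared_words *sw;
};

/* Decide whether 'running' must give up the processor to 'waiting'.
   Returns true if the context switch should happen now. */
bool should_preempt(struct proc *running, const struct proc *waiting)
{
    assert(running != NULL && waiting != NULL && running->sw != NULL);

    /* The waiting process is not more urgent: nothing to do. */
    if (waiting->base_priority <= running->base_priority)
        return false;

    /* The scheduler is about to preempt, so consult the temporary priority. */
    if (running->sw->word1 >= waiting->base_priority) {
        running->sw->word2 = 1;   /* tell unlock that someone is waiting */
        return false;             /* preemption deferred */
    }
    return true;                  /* temporary priority too low: preempt */
}
```

For simplicity the sketch compares against the waiting process's base priority only; it does not model a waiting process that holds a lock of its own.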

In  the  usual  case,  neither  rescheduling  nor  busy-waiting  is  required,  and  only  a  few  instructions 
are  required  to  lock  and  unlock  a  critical  section.  In  particular,  system  calls  are  avoided. 

4.4.2  Scheduling  Control 

Two  primitives  are  provided  for  scheduling  control  which  are  analogous  to  the  kernel's  sleep  and 
wakeup  mechanism: 


P(sem)
{
    lock(sem.mutex);
    while (sem.value == BUSY) {
        enqueue(sem.queue, getpid());
        unlock(sem.mutex);
        suspendproc(getpid(), 0);
        lock(sem.mutex);
    }
    sem.value = BUSY;
    unlock(sem.mutex);
}

V(sem)
{
    lock(sem.mutex);
    pid = dequeue(sem.queue);
    sem.value = FREE;
    unlock(sem.mutex);
    if (pid)
        resumeproc(pid);
}

Figure 2: Binary semaphore

suspendproc(pid, flag)   suspend process pid if flag != 0, or if flag == 0
                         and the RESUME_CALLED flag has not been
                         set for process pid. In any case, clear the
                         RESUME_CALLED flag for pid.

resumeproc(pid)          resume pid if it is suspended, otherwise set the
                         RESUME_CALLED flag for pid.

suspendproc and resumeproc can be used with lock and unlock to implement a wide range of
synchronization primitives. For instance, a binary semaphore can be implemented as in Figure 2.
Note that the RESUME_CALLED flag associated with suspendproc and resumeproc is used here
in an essential way, since another process can call resumeproc in between the time a process has
placed itself on sem.queue and called suspendproc.

Again, this code is quite fast in the usual non-blocking case.
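
To make the role of the resume flag concrete, the following C sketch models the stated semantics of the two calls on a single process record. It is a model only, not kernel code: the struct, the model_ prefix, and the boolean fields are assumptions introduced for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-process state relevant to the two calls. */
struct proc_state {
    bool suspended;       /* process is currently suspended */
    bool resume_called;   /* a resume arrived while the process was running */
};

/* Model of suspendproc: suspend if flag is nonzero, or if flag is zero
   and no resume is pending. Returns true if the process suspends. */
bool model_suspendproc(struct proc_state *p, int flag)
{
    assert(p != NULL);

    bool suspend = (flag != 0) || !p->resume_called;
    p->resume_called = false;      /* cleared in any case */
    if (suspend)
        p->suspended = true;
    return suspend;
}

/* Model of resumeproc: wake a suspended process, otherwise remember
   the wakeup so that the next suspendproc(pid, 0) does not block. */
void model_resumeproc(struct proc_state *p)
{
    assert(p != NULL);

    if (p->suspended)
        p->suspended = false;
    else
        p->resume_called = true;
}
```

In the semaphore case, if V runs after P has queued itself and released the mutex but before P suspends, model_resumeproc records the wakeup and the following model_suspendproc(p, 0) returns immediately instead of sleeping forever.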

4.5      Memory  Management 

SAGE attempts to provide several memory management facilities needed for real-time work, including
shared memory and memory mapped I/O. At the same time, other generally useful facilities, such
as swapping or demand paging, have not been implemented. The net result is that the SAGE virtual
memory system is radically different from that of UNIX in both its interface and implementation.

4.5.1      Segments 

SAGE  memory  management  operations  center  around  the  notion  of  a  segment,  which  is  simply  a 
range of virtual addresses. Segments are referred to by UNIX style pathnames, and can be mapped
into  a  process's  address  space  using  the  open  system  call: 

open("/dev/seg/name",  flags);

where the name part of the pathname identifies the particular segment. Similarly, a segment can be
created by using the O_CREAT flag in open.

Shared memory is provided by having several processes open the same segment simultaneously.
The  open  call  also  returns  a  handle  to  the  segment,  which  can  be  used  in  future  ioctl  calls  for 
performing  control  operations. 

The  first  control  operation  that  must  be  performed  on  a  segment  is  virtual  memory  allocation. 
Usually  the  application  just  specifies  the  desired  segment  size,  and  the  system  selects  the  segment's 
virtual  addresses.  However,  an  application  allocating  virtual  memory  can  force  the  virtual  addresses 
to  reside  at  a  particular  location.  In  this  way,  SAGE  can  accommodate  ROM  and  other  statically- 
bound  address  programs. 

After  virtual  addresses  have  been  allocated,  the  segment  consists  of  a  set  of  invalidated  virtual 
pages.  In  particular,  no  physical  memory  has  yet  been  allocated  to  the  segment.  Therefore,  another 
ioctl  call  is  required  before  a  process  can  reference  the  segment's  addresses.  Usually  this  additional 
call  allocates  physical  memory  for  the  segment.  SAGE  implements  this  call  by  mapping  the  segment's 
virtual  addresses  to  valid  physical  pages. 
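
The required ordering of these control operations can be illustrated with a small state model in C. This is not the SAGE interface, which uses open and ioctl; the function names, the state names, and the bump-pointer allocator standing in for the shared virtual address pool are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum seg_state { SEG_OPENED, SEG_VIRTUAL, SEG_MAPPED };

struct segment {
    enum seg_state state;
    uintptr_t base;   /* first virtual address of the segment */
    size_t size;      /* length in bytes */
};

/* Simplest possible stand-in for a pool of virtual addresses. */
static uintptr_t next_free_vaddr = 0x100000;   /* arbitrary starting point */

/* Step 1: allocate virtual addresses. fixed == 0 lets the system choose;
   a nonzero value forces the location (no overlap check in this sketch). */
int seg_alloc_virtual(struct segment *s, size_t size, uintptr_t fixed)
{
    assert(s != NULL);
    if (s->state != SEG_OPENED || size == 0)
        return -1;
    if (fixed != 0) {
        s->base = fixed;
    } else {
        s->base = next_free_vaddr;
        next_free_vaddr += size;
    }
    s->size = size;
    s->state = SEG_VIRTUAL;   /* pages exist but are all invalid */
    return 0;
}

/* Step 2: back the invalid pages with physical memory. */
int seg_alloc_physical(struct segment *s)
{
    assert(s != NULL);
    if (s->state != SEG_VIRTUAL)
        return -1;
    s->state = SEG_MAPPED;
    return 0;
}

/* A reference is legal only after step 2 and only inside the segment. */
int seg_can_reference(const struct segment *s, uintptr_t addr)
{
    return s->state == SEG_MAPPED && addr >= s->base && addr - s->base < s->size;
}
```

The forced-address path corresponds to the ROM case mentioned above, where the program's addresses are bound before it is loaded.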

Other  operations  allow  more  complicated  mappings  but  are  performed  similarly.  The  map-out 
operation  allows  a  segment  to  reference  bus  memory  or  I/O  space,  and  thus  provides  memory  mapped 
I/O.  The  map-in  operation,  on  the  other  hand,  allows  segment  addresses  to  be  referenced  by  other 
DMA  bus  masters. 

Another  control  operation  allows  the  process  to  set  segment  protections  such  as  read-only,  read- 
write,  etc.  The  typical  process  will  create  at  least  three  segments  for  itself:  a  read-only  text  segment, 
a  read-write  data  segment,  and  a  read-write  stack  segment.  Segment  protection,  combined  with  the 
ability  to  invalidate  individual  pages,  allows  a  SAGE  process  to  set  firewalls  in  its  address  space. 

4.5.2  Address  Sharing 

All SAGE segments currently allocate virtual memory from the same pool. In general this is quite
safe,  since  part  of  the  process's  context  is  a  list  of  mapped  in  segments.  The  system  makes  sure  that 
a  process  can  only  access  its  mapped  in  segments. 

Because all processes share the same virtual addresses, there is no context-dependent addressing.
In  other  words,  a  virtual  address  always  refers  to  the  same  physical  location,  independent  of  the 
process  that  dereferences  the  address.  Therefore,  the  kernel,  by  mapping  all  segments  into  its  address 
space,  can  always  reference  any  process  without  changing  any  hardware  MMU  maps. 

Context-independent addressing allows the system to support user-supplied interrupt handlers
in  a  simple  way.  A  system  call  is  provided  to  set  an  interrupt  vector  to  a  handler  inside  the  process's 
address  space.  Because  the  kernel  can  always  access  every  process,  the  interrupt  handler  can  always 
access  its  shared  variables,  regardless  of  which  process  is  currently  running. 

SAGE  also  provides  two  other  system  calls  to  support  user-supplied  interrupt  handlers.  The  first 
allows  a  process  to  modify  the  processor  status  word,  in  order  to  mask  interrupts.  The  second  call 
allows a process to resume a process that has suspended itself awaiting an event.

One  drawback  of  address  sharing  is  that  a  process  cannot  always  anticipate  what  virtual  addresses 
it  will  occupy.  Therefore,  the  executable  image  either  has  to  be  relocated  before  it  is  loaded  into 
memory,  or  it  has  to  be  written  using  position  independent  code.  Currently,  relocation  on  demand 
is  performed  in  a  similar  way  to  that  done  in  the  BLIT  or  DMD  systems. 

4.5.3  Page  Copies 

Page  mapping  is  also  used  by  SAGE  to  avoid  copying  data  in  and  out  of  the  kernel.  To  do  this, 
SAGE  provides  special  versions  of  the  read  and  write  system  calls,  pgswapread  and  pgswapwrite, 
that  swap  pages  with  the  kernel  instead  of  copying  data  between  address  spaces. 

When  performing  a  pgswapread  call,  the  kernel  maps  the  pages  containing  data  into  the  process's 
address  space.  The  overwritten  pages  in  the  process's  address  space  are  released  back  to  the  kernel. 

The pgswapwrite call works similarly. The pages in the process containing the data to be written
are  remapped  into  the  kernel's  address  space.  The  original  pages  in  the  process  are  then  replaced 
with  new  physical  pages  allocated  by  the  kernel. 
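
The difference between swapping and copying can be seen in the following C sketch, which models a page mapping as a pointer and a remap as a pointer exchange. The function names, the page_t type, and the 4096-byte page size are assumptions for illustration; the real calls operate on MMU mappings, not heap pointers.

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_SIZE 4096   /* assumed page size */

typedef unsigned char *page_t;   /* stands in for a mapped physical page */

/* Model of pgswapread: the kernel page holding the data is mapped into
   the process, and the page it displaces is released to the kernel. */
void model_pgswapread(page_t *proc_slot, page_t *kernel_slot)
{
    assert(proc_slot != NULL && kernel_slot != NULL);

    page_t released = *proc_slot;
    *proc_slot = *kernel_slot;    /* data arrives without a copy */
    *kernel_slot = released;      /* old contents of the process page are lost */
}

/* Model of pgswapwrite: the kernel takes over the process page holding
   the data, and the process receives a fresh page in its place.
   Returns 0 on success, -1 if no fresh page is available. */
int model_pgswapwrite(page_t *proc_slot, page_t *kernel_slot)
{
    assert(proc_slot != NULL && kernel_slot != NULL);
    assert(*kernel_slot == NULL);   /* kernel slot must be empty */

    page_t fresh = calloc(1, PAGE_SIZE);
    if (fresh == NULL)
        return -1;
    *kernel_slot = *proc_slot;
    *proc_slot = fresh;
    return 0;
}
```

In both directions the caller loses whatever was previously visible at the swapped address, which is why the calls do not preserve copy semantics.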

Since neither of these calls preserves copy semantics, they are much simpler to implement than
the "lazy evaluation" techniques used in Accent [11] and Mach [5]. However, because the SAGE calls
are destructive, they are somewhat less convenient to use than the Mach equivalents. Nevertheless,
the SAGE calls have proved useful for real-time work, since copying can still be avoided in most
cases, and page faults are guaranteed not to be taken at inappropriate times.

Function          SAGE              SUN 3/160 (3.2)

System Call       50-80 usecs       120 usecs
Null Process      1 msec            12 msec
UDP 4K writes     367K bytes/sec    374K bytes/sec
Context Switch    60 usecs

Table 1: SAGE vs. UNIX

4.6      Serial  Lines 

In  order  to  improve  support  for  high  speed  serial  lines,  several  extensions  were  made  to  the  Berkeley 
UNIX tty driver. First, the SAGE driver now allows character buffering capacity and water marks
to be specified on a per-line basis. Next, the driver has been expanded (hardware permitting) to
support most UART configurations. Finally, the SAGE driver has added a "block mode" discipline.
In block mode, a read will return only when the given number of characters has been read, or when one
of a given set of characters is read. Any eight-bit character can be a member of this set, which is
specified  by  a  bit  string. 
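
A terminator set of this kind needs 256 bits, one per character value. The C sketch below shows one way such a bit string and the block mode completion test could be written; the names are hypothetical, and two behaviours are assumptions of the sketch: the terminating character is delivered as part of the data, and the function returns early when the simulated input runs out, where a real driver would block.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One bit per eight-bit character value: 256 bits = 32 bytes. */
struct charset {
    unsigned char bits[32];
};

void charset_clear(struct charset *s)
{
    memset(s->bits, 0, sizeof s->bits);
}

void charset_add(struct charset *s, unsigned char c)
{
    s->bits[c >> 3] |= (unsigned char)(1u << (c & 7));
}

int charset_has(const struct charset *s, unsigned char c)
{
    return (s->bits[c >> 3] >> (c & 7)) & 1;
}

/* Deliver characters from 'in' until 'count' have been copied or a
   member of 'terminators' has been copied. Returns the number of
   characters delivered, including any terminator. */
size_t block_read(const unsigned char *in, size_t avail,
                  unsigned char *out, size_t count,
                  const struct charset *terminators)
{
    size_t n = 0;

    assert(in != NULL && out != NULL && terminators != NULL);
    while (n < avail && n < count) {
        unsigned char c = in[n];
        out[n++] = c;
        if (charset_has(terminators, c))
            break;
    }
    return n;
}
```

With newline in the set, a request for 16 characters on the input "ab\ncd" completes after three characters, while a request for 4 characters on "abcdef" completes on the count.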

5  Current  Status 

SAGE is now running on several different Pacific Microsystems 68000-based processor boards, including
both Multibus and VMEbus systems. The essential hardware components used by SAGE
include the memory management unit (for protection and memory mapped I/O), dual ported memory,
two high resolution timers, and on-board software interrupts. These components are found in
most  minicomputers  and  many  microcomputers. 

Extensive benchmarks have yet to be performed for the SAGE system. However, preliminary
figures shown in Table 1 indicate that SAGE's performance is competitive with UNIX systems for
common functions. Here both SAGE and the SUN are using 68020 processors clocked at 16.7 MHz.

Of  course,  the  benchmarks  should  not  be  taken  too  seriously,  since  the  systems  are  totally 
different in many respects. Indeed, SAGE's only performance goal has been to support the type
of supervisory control applications performed in our laboratory. In this regard, SAGE has been
successful.  In  particular,  one  68020-based  system  simultaneously  handles  several  thousand  interrupts 
a  second  and  a  number  of  active  network  connections,  all  while  still  providing  real-time  response  on 
the  order  of  milliseconds. 

6  Conclusion 

Interestingly enough, many of UNIX's real-time problems, such as providing shared memory, efficient
synchronization, and minimal latency, are now being dealt with in multiprocessor UNIX implementations.
This is probably due to the common desire to provide high performance. In addition,
several commercially available UNIX ports have also addressed some of these real-time issues. For
instance, MASSCOMP [8] provides page locking, shared memory, memory-mapped I/O, a real-time
scheduler, and some degree of kernel preemption.

The author knows of no UNIX system, however, which (running on comparable hardware) can
handle the type of applications currently handled by SAGE. In particular, little attention has been
paid in UNIX to reducing interrupt latency or providing network communications compatible with
real-time goals. Hopefully, future UNIX systems will be able to handle such applications.

Many of the ideas appearing in SAGE were inspired by work done elsewhere. The user-level
synchronization techniques are similar to those in Mach [9] and the NYU Ultracomputer. Page
mapping  techniques  were  first  made  prominent  in  Accent.  Many  of  the  ideas  for  timers  and  callouts 
came  from  VMS.  The  SAGE  segment,  process,  and  IPC  interface  was  inspired  by  Version  8  UNIX 
[12],  which  allows  objects  to  be  referred  to  by  pathnames  and  accessed  through  the  open  system 
call.  User-level  interrupt  handlers  were  taken  from  NRTX. 

In  the  future,  we  hope  to  port  SAGE  to  several  other  types  of  processor  boards  and  to  build  a 
multiprocessor system. We would also like to extend the kernel in several ways, including adding
the  ability  to  handle  signals  and  exceptions  in  a  timely  fashion. 